An Adaptive Sampling Algorithm for Solving Markov Decision Processes

Authors

  • Hyeong Soo Chang
  • Michael C. Fu
  • Jiaqiao Hu
  • Steven I. Marcus
Abstract

Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate O(H ln N / N), where H is the horizon length and N is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is O((|A|N)^H), independent of the state space size, where |A| is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.
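The idea of choosing actions adaptively at each sampled state can be illustrated with a short sketch. This is not the authors' exact algorithm: it assumes a UCB1-style index from the bandit literature, per-stage rewards in [0, 1], and a final estimate that weights each action's sample mean by how often that action was sampled. The names adaptive_value_estimate, simulate, and toy_simulate are hypothetical and introduced only for illustration.

```python
import math
import random


def adaptive_value_estimate(state, stage, horizon, actions, simulate, n_samples, rng):
    """Recursively estimate the optimal value-to-go from `state` at `stage`.

    `simulate(state, action, rng)` is an assumed generative model that
    returns a (reward, next_state) pair.
    """
    if stage == horizon:
        return 0.0  # no reward is collected beyond the horizon

    counts = {a: 0 for a in actions}
    totals = {a: 0.0 for a in actions}

    def sample(action):
        # One simulated transition plus a recursive estimate of the value-to-go.
        reward, next_state = simulate(state, action, rng)
        future = adaptive_value_estimate(
            next_state, stage + 1, horizon, actions, simulate, n_samples, rng
        )
        totals[action] += reward + future
        counts[action] += 1

    # Initialization: sample every action once.
    for a in actions:
        sample(a)

    # Adaptive phase: spend the remaining budget on the action with the
    # largest upper confidence index (UCB1-style bonus, an assumption here).
    for n in range(len(actions), n_samples):
        best = max(
            actions,
            key=lambda a: totals[a] / counts[a]
            + math.sqrt(2.0 * math.log(n) / counts[a]),
        )
        sample(best)

    # Weight each action's sample mean by its share of the samples.
    total_n = sum(counts.values())
    return sum((counts[a] / total_n) * (totals[a] / counts[a]) for a in actions)


def toy_simulate(state, action, rng):
    """Hypothetical simulator: action 1 pays 1 w.p. 0.8, action 0 w.p. 0.2."""
    p = 0.8 if action == 1 else 0.2
    reward = 1.0 if rng.random() < p else 0.0
    return reward, state  # the toy state never changes


if __name__ == "__main__":
    rng = random.Random(0)
    print(adaptive_value_estimate(0, 0, 2, [0, 1], toy_simulate, 16, rng))
```

Each sampled state spends its N samples on recursive calls one stage deeper, so the number of simulated transitions grows exponentially in the horizon H but does not depend on the size of the state space, in line with the complexity statement in the abstract.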

Similar resources

An Adaptive Approach to Increase Accuracy of Forward Algorithm for Solving Evaluation Problems on Unstable Statistical Data Set

Nowadays, hidden Markov models are extensively utilized for modeling stochastic processes. These models help researchers establish and implement the desired theoretical foundations using Markov algorithms such as the Forward algorithm. However, using the stability hypothesis and the mean statistic for determining the values of Markov functions on unstable statistical data sets has led to a significant reducti...

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

Mini/Micro-Grid Adaptive Voltage and Frequency Stability Enhancement Using Q-learning Mechanism

This paper develops an adaptive method for controlling the frequency and voltage of an islanded mini/micro grid (M/µG) using reinforcement learning. Reinforcement learning (RL) is a branch of machine learning and the main solution method for Markov decision processes (MDPs). Among the several solution methods of RL, the Q-learning method is used for solving RL in th...

Optimal Importance Sampling in Markov Process Simulation

In this paper we present an adaptive algorithm to estimate the transient blocking probability of a communication system, described by a Markov process, during a finite time interval starting from a given state. The method uses importance sampling for variance reduction and adjusts the parameters of the twisted distribution based on earlier samples. The method can be effectively applied to a dec...

Randomized Search Methods for Solving Markov Decision Processes and Global Optimization

Title of dissertation: RANDOMIZED SEARCH METHODS FOR SOLVING MARKOV DECISION PROCESSES AND GLOBAL OPTIMIZATION. Jiaqiao Hu, Doctor of Philosophy, 2006. Dissertation directed by: Professor Steven I. Marcus, Department of Electrical and Computer Engineering; Professor Michael C. Fu, Department of Decision and Information Technology. Markov decision process (MDP) models provide a unified framework for m...

Journal title:
  • Operations Research

Volume 53  Issue

Pages  -

Publication date 2005